5 research outputs found

Reliable state retention-based embedded processors through monitoring and recovery

Author: Al-Hashimi Bashir
Flynn David
Idgunji Sachin
Khursheed Syed Saqib
Yang Sheng
Publication date: 01/12/2011

State retention power gating and voltage-scaled state retention are two effective design techniques, commonly employed in embedded processors, for reducing idle circuit leakage power. This paper presents a methodology for improving the reliability of embedded processors in the presence of power supply noise and soft errors. A key feature of the method is its low cost, achieved by reusing the scan chain for state monitoring. It is effective because it corrects single-bit errors in hardware and multiple-bit errors in software. To validate the methodology, an ARM Cortex-M0 embedded microprocessor (provided by our industrial project partner) is implemented in FPGA and further synthesized using 65-nm technology to quantify the cost in terms of area, latency and energy. It is shown that the proposed methodology has a small area overhead (8.6%) with less than 4% worst-case increase in critical path, and is capable of detecting and correcting both single-bit and multi-bit errors for a wide range of fault rates.

University of Liverpool Repository

Southampton (e-Prints Soton)

MLPerf Inference Benchmark

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability. Comment: ISCA 202

arXiv.org e-Print Archive

Crossref

Scale-out processors

Author: Almutaz Adileh
Babak Falsafi
Boris Grot
Djordje Jevdjic
Emre Ozer
Javier Picorel
Michael Ferdman
Onur Kocberber
Pejman Lotfi-Kamran
Sachin Idgunji
Stavros Volos
Publication date: 01/01/2012

Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., inter-pod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scale-out chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations.
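
The pod-selection idea in this abstract can be illustrated with a minimal sketch. It assumes performance density means throughput per unit of die area, which the abstract does not define precisely. The PodConfig type, the helper functions, and every number below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodConfig:
    """A candidate pod: cores sharing one last-level cache (hypothetical values)."""
    cores: int
    llc_mb: float
    throughput: float  # aggregate pod throughput, arbitrary units
    area_mm2: float    # pod area: cores, LLC, and interconnect


def performance_density(pod: PodConfig) -> float:
    """Assumed metric: throughput per square millimetre of die area."""
    return pod.throughput / pod.area_mm2


def best_pod(candidates: list[PodConfig]) -> PodConfig:
    """Pick the candidate with the highest performance density."""
    return max(candidates, key=performance_density)


def fill_die(pod: PodConfig, die_area_mm2: float) -> tuple[int, float]:
    """Replicate the pod across the die; return (pod count, chip throughput)."""
    count = int(die_area_mm2 // pod.area_mm2)
    return count, count * pod.throughput


# Made-up design points, for illustration only.
CANDIDATES = [
    PodConfig(cores=8, llc_mb=8.0, throughput=8.0, area_mm2=50.0),
    PodConfig(cores=16, llc_mb=4.0, throughput=15.0, area_mm2=60.0),
    PodConfig(cores=32, llc_mb=4.0, throughput=24.0, area_mm2=110.0),
]

if __name__ == "__main__":
    pod = best_pod(CANDIDATES)
    pods, chip_throughput = fill_die(pod, die_area_mm2=250.0)
    print(pod.cores, pods, chip_throughput)
```

Because pods are independent servers, chip throughput in this sketch is simply the pod count times the per-pod throughput, with no term for inter-pod interconnect or coherence.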

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

In brief

Author: Falsafi Babak
Hardy Damien
Idgunji Sachin
Jevdjic Djordje
Lotfi-Kamran Pejman
Milojevic Dragomir
Nicopoulos Chrysostomos
Ozer Emre
Panteli Andreas
Prodromou Andreas
Sazeides Yiannakis
Publication date: 01/01/2012

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Research Papers in Economics

DI-fusion

Scale-out processors

Author: Almutaz Adileh
Babak Falsafi
Baron M.
Boris Grot
Djordje Jevdjic
Emre Ozer
Hardavellas N.
Javier Picorel
Michael Ferdman
Onur Kocberber
Pejman Lotfi-Kamran
Sachin Idgunji
Stavros Volos
Turley J.
Wheeler B.
Zhao L.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref